Advances in dissimilarity-based data visualisation
Abstract
The amount of collected digital information grows every day. This development is driven by advanced technology that allows more detailed and more complex measurements. As a result, the huge data sets gathered in this way are no longer easily comprehensible to a human. Dimensionality reduction is an important tool for visualising highly complex data in two dimensions, allowing users to gain insight into the structure of the data. In this thesis, we give an overview of dimensionality reduction and discuss the challenges arising in this context. Moreover, we examine in detail the capabilities of dimensionality reduction using two exemplary techniques: generative topographic mapping, a parametric technique that generates a model of the data; and t-distributed stochastic neighbour embedding, a nonparametric projection technique based on a cost function. As the core contribution of this thesis, we discuss four major limitations of dimensionality reduction techniques and present solutions to overcome them. (i) One important requirement for a visualisation technique is extensibility to new data: after a mapping has been constructed for given data, new data should be visualised in a consistent way. While this is readily available for parametric techniques, nonparametric methods lack this property, and we present a general approach that provides an explicit mapping. (ii) Another limitation concerns the treatment of complex data: in many scenarios classical feature vectors are no longer sufficient; instead, the use of pairwise data relations in the form of similarity or dissimilarity measures becomes mandatory. Some algorithms cannot deal with dissimilarity data, so we develop an approach that solves this problem via an implicit vectorial embedding.
(iii) Techniques working on (dis-)similarity data suffer from high computational and memory complexity, since the matrix of pairwise relations grows quadratically with the number of data points. The Nyström approximation can be applied to reduce this complexity drastically. In this context, we generalise the Nyström method to arbitrary symmetric (dis-)similarity matrices and define a universal framework that results in linear-time algorithms. (iv) Finally, we address the fact that data visualisation is an ill-posed problem: there are multiple ways to reduce the dimensionality, and it is in general not clear which one is best. We therefore follow the metric learning paradigm to focus on the dimensions that are important for class separation. We investigate this approach for a parametric technique on the one hand and introduce a general framework for nonparametric techniques on the other.
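To make point (iii) concrete, the following is a minimal sketch of the standard Nyström approximation applied to a symmetric dissimilarity matrix. It is not the framework developed in the thesis. The function names (`nystroem_core`, `approx_matvec`), the toy data, the number of landmarks, and the pseudo-inverse cutoff `rcond` are all illustrative assumptions. The idea is that only m landmark columns of the n x n matrix are needed, and a matrix-vector product then costs O(nm + m^2) instead of O(n^2).

```python
import numpy as np


def nystroem_core(D_nm, landmarks, rcond=1e-10):
    """Pseudo-inverse of the m x m landmark block (hypothetical helper).

    D_nm      : (n, m) array, the landmark columns of a symmetric matrix D
    landmarks : indices of the m landmark points among the n points
    rcond     : assumed cutoff for small singular values
    """
    D_mm = D_nm[landmarks, :]          # dissimilarities among the landmarks
    return np.linalg.pinv(D_mm, rcond)


def approx_matvec(D_nm, W, v):
    """Approximate D @ v as D_nm @ W @ D_nm.T @ v without forming D."""
    return D_nm @ (W @ (D_nm.T @ v))


# Toy data (assumption): squared Euclidean distances between 2-D points.
# Such a matrix has rank at most 4, so a few landmarks can reproduce it.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)

landmarks = rng.choice(200, size=10, replace=False)
D_nm = D[:, landmarks]                 # the only part of D the method needs
W = nystroem_core(D_nm, landmarks)

# The full reconstruction is formed here only to check the approximation.
D_hat = D_nm @ W @ D_nm.T
rel_err = np.linalg.norm(D - D_hat) / np.linalg.norm(D)

v = rng.normal(size=200)
Dv_approx = approx_matvec(D_nm, W, v)
```

In this toy case the dissimilarity matrix is low-rank, so the reconstruction is accurate up to numerical error. For general dissimilarity data the quality depends on how well the chosen landmarks capture the dominant structure of the matrix.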
Publication date: 2015